Zip Code Data

Read in the previously processed/cleaned zip code data in Stata’s format, and begin to assign exposure peiod variables by race.

foo <- read.dta("~/Wildfires_kurt/race eth of asthma ED visits by zip day_exp merged.dta") %>% mutate(wildfire = ifelse(pm_25 > 0, "WFperiod", "nonWFperiod")) %>%
  rename(zip = bene_addr_zip)

Wildfire by day for different zips codes

First we can look at the concentrations in each zip code and day.

Next we’ll classify each day-zip combination as experiencing some wildfire exposure or not. Wildfire period is defined as any day where the zip code has a PM2.5 value greater than zero (note: some are very low).

Calculating RR

Now we can begin to calculate RR, first for the entire zip codes (not race stratified). We’ll use a constant of o.5 to fill in spots where there are no cases during the exposure or control periods.

constant <- 0.5

rates_overall <- foo %>%
  mutate(total = white + hispanic + black + native_am + asian_pi + other + missing) %>%
  group_by(zip, wildfire) %>% 
  summarise(cases = sum(total),
            days = length(svc_from_dt)) %>%
    mutate(casesPerDay =  cases/days)

RR_overall <- foo %>%
  mutate(total = white + hispanic + black + native_am + asian_pi + other + missing) %>%
  group_by(zip, wildfire) %>% 
  summarise(cases = sum(total),
            days = length(svc_from_dt)) %>%
    mutate(cases = ifelse(cases == 0, constant, cases),
      casesPerDay = cases/days) %>%
    select(zip, wildfire, casesPerDay) %>%
    spread(key = wildfire, value = casesPerDay) %>%
    mutate(RateRatio = WFperiod/nonWFperiod,
           ln_RateRatio = log(RateRatio)) %>%
  rename(WF_rate = WFperiod, 
         nonWF_rate = nonWFperiod)

DT::datatable({rates_overall}) %>% DT::formatRound(5,5) 
DT::datatable({RR_overall}) %>% DT::formatRound(c(2:5),3)

Plot Overall rate ratios

## Calculating race-specific RR

Now we can begin to calculate race-specific RR.

RR_race <- foo %>%
  group_by(zip, wildfire) %>% 
  summarise(white = sum(white),
            hispanic = sum(hispanic),
            black = sum(black),
            native_am = sum(native_am),
            asian_pi = sum(asian_pi),
            other = sum(other),
            days = length(svc_from_dt)) %>%
  gather(white, hispanic, black, native_am, asian_pi,other, key = race, value = cases) %>%
    mutate(cases = ifelse(cases == 0, constant, cases),
           casesPerDay = cases/days) %>%
    select(zip, wildfire, casesPerDay, race) %>%
    spread(key = wildfire, value = casesPerDay) %>%
    mutate(RateRatio = WFperiod/nonWFperiod,
           ln_RateRatio = log(WFperiod/nonWFperiod))
  


rates_overall <- foo %>%
  mutate(total = white + hispanic + black + native_am + asian_pi + other + missing) %>%
  group_by(zip, wildfire) %>% 
  summarise(cases = sum(total),
            days = length(svc_from_dt)) %>%
    mutate(casesPerDay =  cases/days)



DT::datatable({RR_race}) %>% DT::formatRound(c(3:6),3)

Look at the distrubutions of race_specific rate ratios, need to look into why so many of these are focused around 1.5…?